AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsNov-21-2025, 15:51:05 GMT

Fair Clustering Through Fairlets

We study the question of fair clustering under the {\em disparate impact} doctrine, where each protected class must have approximately equal representation in every cluster. We formulate the fair clustering problem under both the k-center and the k-median objectives, and show that even with two protected classes the problem is challenging, as the optimum solution can violate common conventions---for instance a point may no longer be assigned to its nearest cluster center! En route we introduce the concept of fairlets, which are minimal sets that satisfy fair representation while approximately preserving the clustering objective. We show that any fair clustering problem can be decomposed into first finding good fairlets, and then using existing machinery for traditional clustering algorithms. While finding good fairlets can be NP-hard, we proceed to obtain efficient approximation algorithms based on minimum cost flow. We empirically demonstrate the \emph{price of fairness} by quantifying the value of fair clustering on real-world datasets with sensitive attributes.

electronic proceedings, fair clustering, name change, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.84)

Chakraborty, Diptarka, Chatterjee, Kushagra, Das, Debarati, Nguyen, Tien-Long

Generalizing Fair Clustering to Multiple Groups: Algorithms and Applications

arXiv.org Artificial IntelligenceNov-17-2025

Clustering is a fundamental task in machine learning and data analysis, but it frequently fails to provide fair representation for various marginalized communities defined by multiple protected attributes -- a shortcoming often caused by biases in the training data. As a result, there is a growing need to enhance the fairness of clustering outcomes, ideally by making minimal modifications, possibly as a post-processing step after conventional clustering. Recently, Chakraborty et al. [COLT'25] initiated the study of \emph{closest fair clustering}, though in a restricted scenario where data points belong to only two groups. In practice, however, data points are typically characterized by many groups, reflecting diverse protected attributes such as age, ethnicity, gender, etc. In this work, we generalize the study of the \emph{closest fair clustering} problem to settings with an arbitrary number (more than two) of groups. We begin by showing that the problem is NP-hard even when all groups are of equal size -- a stark contrast with the two-group case, for which an exact algorithm exists. Next, we propose near-linear time approximation algorithms that efficiently handle arbitrary-sized multiple groups, thereby answering an open question posed by Chakraborty et al. [COLT'25]. Leveraging our closest fair clustering algorithms, we further achieve improved approximation guarantees for the \emph{fair correlation clustering} problem, advancing the state-of-the-art results established by Ahmadian et al. [AISTATS'20] and Ahmadi et al. [2020]. Additionally, we are the first to provide approximation algorithms for the \emph{fair consensus clustering} problem involving multiple (more than two) groups, thus addressing another open direction highlighted by Chakraborty et al. [COLT'25].

artificial intelligence, dist, machine learning, (17 more...)

2511.11539

Country:

Asia (0.46)
North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Machine LearningMay-26-2025

Fair Clustering via Alignment

Kim, Kunwoong, Lee, Jihu, Park, Sangchul, Kim, Yongdai

Algorithmic fairness in clustering aims to balance the proportions of instances assigned to each cluster with respect to a given sensitive attribute. While recently developed fair clustering algorithms optimize clustering objectives under specific fairness constraints, their inherent complexity or approximation often results in suboptimal clustering utility or numerical instability in practice. To resolve these limitations, we propose a new fair clustering algorithm based on a novel decomposition of the fair $K$-means clustering objective function. The proposed algorithm, called Fair Clustering via Alignment (FCA), operates by alternately (i) finding a joint probability distribution to align the data from different protected groups, and (ii) optimizing cluster centers in the aligned space. A key advantage of FCA is that it theoretically guarantees approximately optimal clustering utility for any given fairness level without complex constraints, thereby enabling high-utility fair clustering in practice. Experiments show that FCA outperforms existing methods by (i) attaining a superior trade-off between fairness level and clustering utility, and (ii) achieving near-perfect fairness without numerical instability.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

2505.09131

Country:

Asia > South Korea > Seoul > Seoul (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
North America > Canada (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Setzu, Mattia, Guidotti, Riccardo

Fair Clustering with Clusterlets

arXiv.org Artificial IntelligenceMay-13-2025

Given their widespread usage in the real world, the fairness of clustering methods has become of major interest. Theoretical results on fair clustering show that fairness enjoys transitivity: given a set of small and fair clusters, a trivial centroid-based clustering algorithm yields a fair clustering. Unfortunately, discovering a suitable starting clustering can be computationally expensive, rather complex or arbitrary. In this paper, we propose a set of simple \emph{clusterlet}-based fuzzy clustering algorithms that match single-class clusters, optimizing fair clustering. Matching leverages clusterlet distance, optimizing for classic clustering objectives, while also regularizing for fairness. Empirical results show that simple matching strategies are able to achieve high fairness, and that appropriate parameter tuning allows to achieve high cohesion and low overlap.

artificial intelligence, clusterlet, machine learning, (16 more...)

2505.06259

Country:

North America > Canada > Ontario > Toronto (0.04)
Europe > Denmark > Capital Region > Copenhagen (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Quy, Tai Le, Thanh, Long Le, Hong, Lan Luong Thi, Hopfgartner, Frank

FACROC: a fairness measure for FAir Clustering through ROC curves

arXiv.org Artificial IntelligenceMar-2-2025

Fair clustering has attracted remarkable attention from the research community. Many fairness measures for clustering have been proposed; however, they do not take into account the clustering quality w.r.t. the values of the protected attribute. In this paper, we introduce a new visual-based fairness measure for fair clustering through ROC curves, namely FACROC. This fairness measure employs AUCC as a measure of clustering quality and then computes the difference in the corresponding ROC curves for each value of the protected attribute. Experimental results on several popular datasets for fairness-aware machine learning and well-known (fair) clustering models show that FACROC is a beneficial method for visually evaluating the fairness of clustering models.

clustering, dataset, fairness measure, (11 more...)

2503.00854

Country:

Asia > Vietnam > Hanoi > Hanoi (0.05)
Europe > United Kingdom > England > South Yorkshire > Sheffield (0.04)
Europe > Germany (0.04)
Asia > Taiwan (0.04)

Genre: Research Report (0.64)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Neural Information Processing SystemsOct-11-2024, 07:09:13 GMT

Fair Clustering Under a Bounded Cost

Clustering is a fundamental unsupervised learning problem where a dataset is partitioned into clusters that consist of nearby points in a metric space. A recent variant, fair clustering, associates a color with each point representing its group membership and requires that each color has (approximately) equal representation in each cluster to satisfy group fairness. The relative increase in the cost, the ''price of fairness,'' can indeed be unbounded. Therefore, in this paper we propose to treat an upper bound on the clustering objective as a constraint on the clustering problem, and to maximize equality of representation subject to it. We consider two fairness objectives: the group utilitarian objective and the group egalitarian objective, as well as the group leximin objective which generalizes the group egalitarian objective.

algorithm, egalitarian objective, objective, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsOct-8-2024, 04:23:14 GMT

Reviews: Fair Clustering Through Fairlets

While there is a growing body of work on fair supervised learning, here the authors present a first exploration of fair unsupervised learning by considering fair clustering. Each data point is labeled either red or blue, then an optimal k-clustering is sought which respects a given fairness criterion specified by a minimum threshold level of balance in each cluster. Balance lies in [0,1] with higher values corresponding to more equal numbers of red and blue points in every cluster. This is an interesting and natural problem formulation - though a more explicit example of exactly how this might be useful in practice would be helpful. For both k-center and k-median versions of the problem, it is neatly shown that fair clustering may be reduced to first finding a good'fairlet' decomposition and then solving the usual clustering problem on the centers of each fairlet.

fair clustering, fairlet, review, (1 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.42)

Dickerson, John, Esmaeili, Seyed A., Morgenstern, Jamie, Zhang, Claire Jie

Fair Clustering: Critique, Caveats, and Future Directions

arXiv.org Artificial IntelligenceJun-22-2024

Clustering is a fundamental problem in machine learning and operations research. Therefore, given the fact that fairness considerations have become of paramount importance in algorithm design, fairness in clustering has received significant attention from the research community. The literature on fair clustering has resulted in a collection of interesting fairness notions and elaborate algorithms. In this paper, we take a critical view of fair clustering, identifying a collection of ignored issues such as the lack of a clear utility characterization and the difficulty in accounting for the downstream effects of a fair clustering algorithm in machine learning settings. In some cases, we demonstrate examples where the application of a fair clustering algorithm can have significant negative impacts on social welfare. We end by identifying a collection of steps that would lead towards more impactful research in fair clustering.

algorithm, application, fairness, (12 more...)

2406.1596

Country:

Europe > Austria > Vienna (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > North Carolina (0.04)
(7 more...)

Genre:

Overview (0.67)
Research Report (0.50)

Industry:

Law (0.92)
Education (0.67)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

arXiv.org Artificial IntelligenceApr-20-2023

Deep Fair Clustering via Maximizing and Minimizing Mutual Information: Theory, Algorithm and Metric

Zeng, Pengxin, Li, Yunfan, Hu, Peng, Peng, Dezhong, Lv, Jiancheng, Peng, Xi

Fair clustering aims to divide data into distinct clusters while preventing sensitive attributes (\textit{e.g.}, gender, race, RNA sequencing technique) from dominating the clustering. Although a number of works have been conducted and achieved huge success recently, most of them are heuristical, and there lacks a unified theory for algorithm design. In this work, we fill this blank by developing a mutual information theory for deep fair clustering and accordingly designing a novel algorithm, dubbed FCMI. In brief, through maximizing and minimizing mutual information, FCMI is designed to achieve four characteristics highly expected by deep fair clustering, \textit{i.e.}, compact, balanced, and fair clusters, as well as informative features. Besides the contributions to theory and algorithm, another contribution of this work is proposing a novel fair clustering metric built upon information theory as well. Unlike existing evaluation metrics, our metric measures the clustering quality and fairness as a whole instead of separate manner. To verify the effectiveness of the proposed FCMI, we conduct experiments on six benchmarks including a single-cell RNA-seq atlas compared with 11 state-of-the-art methods in terms of five metrics. The code could be accessed from \url{ https://pengxi.me}.

data mining, information, machine learning, (18 more...)

2209.12396

Country: Asia > China (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Information Management (0.82)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
(2 more...)